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Building  on  the  work  of  Iftimie  et  al.  [J.  Chem.  Phys.  113,  4852  (2000)]  and  Gelb  [J.  Chem.  Phys. 
118,  7747  (2003)],  Boltzmann  sampling  of  an  approximate  potential  (the  “reference”  system)  is 
used  to  build  a  Markov  chain  in  the  isothermal-isobaric  ensemble.  At  the  end  points  of  the  chain,  the 
energy  is  evaluated  at  a  more  accurate  level  (the  “full”  system)  and  a  composite  move  encompassing 
all  of  the  intervening  steps  is  accepted  on  the  basis  of  a  modified  Metropolis  criterion.  For  reference 
system  chains  of  sufficient  length,  consecutive  full  energies  are  statistically  decorrelated  and  thus  far 
fewer  are  required  to  build  ensemble  averages  with  a  given  variance.  Without  modifying  the  original 
algorithm,  however,  the  maximum  reference  chain  length  is  too  short  to  decorrelate  full 
configurations  without  dramatically  lowering  the  acceptance  probability  of  the  composite  move. 
This  difficulty  stems  from  the  fact  that  the  reference  and  full  potentials  sample  different  statistical 
distributions.  By  manipulating  the  thermodynamic  variables  characterizing  the  reference  system 
(pressure  and  temperature,  in  this  case),  we  maximize  the  average  acceptance  probability  of 
composite  moves,  lengthening  significantly  the  random  walk  between  consecutive  full  energy 
evaluations.  In  this  manner,  the  number  of  full  energy  evaluations  needed  to  precisely  characterize 
equilibrium  properties  is  dramatically  reduced.  The  method  is  applied  to  a  model  fluid,  but 
implications  for  sampling  high-dimensional  systems  with  ab  initio  or  density  functional  theory 
potentials  are  discussed.  ©  2009  American  Institute  of  Physics.  [DOI:  10.1063/1.3116788] 


I.  INTRODUCTION 

Characterization  of  thermodynamic  equilibrium  using 
Markov  chain  Monte  Carlo  (MC)2  methods  is  now  well- 
established  practice. 1-4  Instead  of  building  time  averages  for 
an  ensemble  of  trajectories,  as  in  molecular  dynamics 
(MD),1'2'5  configurational  integrals  are  sampled  directly  at 
points  dictated  by  a  random  walk.  New  points  are  added  to 
the  Markov  chain  on  the  basis  of  an  acceptance  criterion, 
most  often  that  of  Metropolis,6  and  the  simulation  is  com¬ 
plete  when  variance  in  (thermodynamic)  ensemble  averages 
has  dropped  to  an  acceptable  level.  This  level  varies  inevita¬ 
bly  with  application,  but  the  number  of  steps  required  to 
achieve  a  target  variance  generally  rises  with  the  dimension¬ 
ality  of  configuration  space.  For  this  reason,  precision  sam¬ 
pling  of  high-dimensional  systems  remains  a  serious  chal¬ 
lenge. 

Methodological  improvements  in  solving  the  electronic 
Schrodinger  equation,  coupled  with  steady  advances  in  com¬ 
puting  power,  have  made  single-point  calculation  of  ab 
initio1  (AI)  or  density  functional  theory8  (DFT)  energies  rou¬ 
tine  even  for  very  large  systems.7  Paired  with  algorithms  for 
extracting  forces  from  wave  functions  (or  densities) 
analytically,9  these  improvements  lead  directly  to  steady 
growth  in  the  application  of  AIMD.10  The  potential  energy 
surface  (PES)  in  AIMD  is  built  “on  the  fly”  using  quantum 
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chemistry  in  place  of  ad  hoc  functional  forms,  permitting 
more  robust  and  accurate  sampling  of  phase  space.  Expanded 
use  of  AIMD  has  not  been  matched,  however,  by  commen¬ 
surate  growth  of  AI  (MC)2,  although  the  use  of  AI  potentials 
in  (MC)2  simulation  has  been  the  subject  of  very  recent 
attention.111-  While  MD  steps  are  collective  and  determinis¬ 
tic,  standard  (MC)2  steps  are  individual  and  stochastic;  the 
computational  exchange  made  in  substituting  MD  for  (MC)2 
is  that  of  force  calculation  at  every  time  step  in  return  for 
steps  encompassing  all  particles  in  the  system.  The  single¬ 
particle  character  of  standard  (MC)2  steps  can  be  exploited  to 
lower  their  cost  from  0{N2)  to  0(N)  in  a  system  of  N  par¬ 
ticles  described  by  a  pair  potential,  but  no  analogous  reduc¬ 
tion  is  afforded  self-consistent  potentials  including  interac¬ 
tion  levels  much  higher  than  the  pair.  It  remains  true, 
however,  that  (MC)2  possesses  inherent  flexibility  unavail- 
able  to  MD,  such  as  constant  temperature  or  pressure  sam¬ 
pling  without  need  of  a  stochastic  bath,  or  chemical  equilib¬ 
rium  sampling  without  need  of  a  reactive  potential  surface.14 
For  these  reasons,  it  is  worthwhile  to  explore  (MC)2  algo¬ 
rithms  harnessing  the  accuracy  of  AI  quantum  chemistry 
without  requiring  full  system  energy  evaluation  following 
every  single-particle  displacement. 

One  alternative  is  to  build  trial  moves  from  collective 
displacement  of  several  or  even  all  particles.  The  acceptance 
probability  of  a  collective  move  will  be  much  lower  than  that 
of  its  constituents  taken  independently,  however,  because 
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single-particle  steps  are  chosen  randomly  and  thus  lack  in¬ 
formation  regarding  intermolecular  forces  (except  for  vari¬ 
ants  such  as  force  bias15  and  “smart”16  MC).  This  fact  is 
illustrated  clearly  in  a  hard-sphere  fluid,  where  the  likelihood 
that  two  particles  will  overlap  increases  monotonically  with 
the  number  of  particles  displaced;  if  a  collective  step  yields 
even  a  single  overlap,  its  acceptance  probability  will  vanish 
entirely.  This  will  result  in  many  wasted  trial  steps,  a  weighty 
consideration  if  each  acceptance  test  requires  significant 
computing  time.  The  radius  of  trial  moves  could  be  dramati¬ 
cally  reduced  in  order  to  salvage  the  acceptance  probability, 
but  only  at  the  expense  of  slow  configuration  space  explora¬ 
tion;  as  before,  precise  equilibrium  averages  will  require 
many  energy  evaluations.  In  this  sense  MD  steps  can  be 
viewed  as  “directed”  forms  of  multiparticle  (MC)2  moves,  in 
that  time-reversible  integration  of  the  equations  of  motion 
guarantees  energy  conservation  and  thus  unit  acceptance 
probability  of  the  “trial  move.”  No  such  guarantee  exists 
when  trial  moves  are  chosen  stochastically. 

An  alternative  means  of  building  (MC)2  steps  was  intro- 
duced  recently  by  Iftimie  et  al.,  followed  by  an  indepen- 
dent  treatment  from  Gelb.  Although  several  monikers  have 
been  applied,16  we  will  refer  to  this  procedure  as  nested 
Markov  chain  MC  (N(MC)2).  The  method  is  conceptually 
related  to  (MC)2  with  stochastic  potential  switching,20  mul- 
tiple  “time  steps,”"  multilevel  summation,  "  resolution 
exchange,"3  and  hybrid  replica  exchange."4  In  N(MC)2  a  se¬ 
ries  of  elementary  moves  (in  the  NPT  ensemble,  single  par¬ 
ticle  or  volume  adjustments),  each  accepted  with  Boltzmann 
weight,  is  made  in  a  “reference”  system  defined  by  an  inex¬ 
pensive  (but  less  accurate)  potential.  At  the  end  points  of  this 
sequence,  the  energy  is  evaluated  again  with  a  more  accurate 
potential  defining  the  “full”  system.  Through  appropriate 
modification  of  the  acceptance  criterion,  the  reference  system 
Markov  chain  is  transformed  into  a  composite  trial  step  ac¬ 
cepted  with  Boltzmann  weight  in  the  full  system.  As  long  as 
the  reference  potential  captures  adequately  the  physics  of  the 
full  potential,  these  composite  trial  moves  retain  a  reasonable 
probability  of  acceptance;  the  more  reference  steps  compris¬ 
ing  a  composite  move  (or  the  less  capably  the  reference  po¬ 
tential  captures  the  interactions  present  in  the  full  potential), 
the  lower  its  acceptance  probability.  The  difficulty  is  that  the 
reference  and  full  potentials  sample  different  statistical  dis¬ 
tributions,  and  so  the  number  of  reference  steps  combinable 
into  a  composite  step  is  strongly  limited  by  the  practical  need 
of  a  reasonable  acceptance  probability  for  the  latter.  In  spite 
of  this  difficulty,  N(MC)2  permits  (MC)2  sampling  of  an  ac¬ 
curate  potential  without  having  to  evaluate  it  following  every 
single-particle  displacement,  and  in  this  sense  represents  an 
important  step  toward  realistic  implementation  of  (MC)2 
with  an  AI  potential.  Although  its  application  already  has 


12  25 

been  fairly  extensive,  "’"  the  present  work  attempts  to  im¬ 
prove  upon  the  original  N(MC)2  algorithm  by  addressing  its 
principal  weakness;  namely,  the  potentially  poor  overlap  of 
reference  and  full  distributions. 

In  order  to  minimize  the  number  of  full  energy  evalua¬ 
tions  required  to  achieve  target  variance  in  ensemble  aver¬ 
ages,  configurations  at  which  the  full  energy  is  evaluated 
should  be  as  decorrelated  ( vide  infra )  as  possible.  Decorre¬ 
lation  requires  separation  by  a  large  number  of  reference 
steps,  a  number  constrained  also  by  the  acceptance  probabil¬ 
ity  for  the  composite  step.  By  manipulating  the  thermody¬ 
namic  variables  of  the  reference  system,  we  show  how  to 
maximize  the  overlap  of  reference  and  full  distributions.  This 
procedure  maximizes  also  the  acceptance  probability  for 
composite  steps  built  from  a  fixed  number  of  reference  steps, 
minimizes  the  correlation  of  energies  sampled  in  the  full  sys¬ 
tem,  and  thereby  lowers  considerably  the  number  of  full  en¬ 
ergy  evaluations  needed  to  sample  with  high  precision. 

Section  II  describes  the  potentials  used  to  generate  the 
results  that  follow.  The  next  sections  provide  a  brief  over¬ 
view  of  conventional  MC  sampling  (III)  and  basic  N(MC)2 
(IV).  Section  V  contains  our  primary  contribution,  wherein 
we  outline  a  means  of  optimizing  N(MC)2  sampling  effi¬ 
ciency.  Section  VI  summarizes  and  offers  some  suggestions 
for  further  development. 

II.  POTENTIAL  MODEL 

The  N(MC)2  procedure  evaluates  the  energy  of  a  con¬ 
figuration  using  two  different  potentials:  an  approximate  po¬ 
tential  for  single-particle  steps  and  a  more  accurate  one  for 
composite  steps.  We  assume  that  quantities  for  comparison 
with  experiment  are  computed  in  the  full  system.  In  the 
present  work,  the  purpose  of  which  is  to  test  and  optimize 
the  procedure,  we  will  utilize  combinations  of  pair  potentials 
equal  in  computational  expense  but  differing  in  their  param- 
etrization.  In  the  future,  a  model  potential  will  be  used  as 
reference  for  a  full  system  characterized  with  DFT."6 

The  model  potentials  used  below  describe  diatomic  mol¬ 
ecules  of  fixed  bond  length  and  with  interaction  sites  at  their 
atomic  centers.  Pairwise  interaction  of  atomic  sites  a  and  b 
are  modeled  with  the  Buckingham  exponential-6  potential. 


<p(rab)  =  -^—{  6ea(lr^--^-\  (1) 

«-6V  rabl 
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TLX 


FIG.  1 .  Diatomic  pair  configurations  used  to  illustrate  the  anisotropy  of  the 
potential  defined  by  Eqs.  (l)-(4)  and  its  variation  with  intramolecular  bond 
length  /  in  Fig.  2.  Bond  lengths  are  fixed  at  1.10  A  in  the  full  system,  but 
varied  from  0.90  to  1.05  A  in  the  reference  systems.  Configuration  types  T, 
L.  and  X  are  characterized  by  the  four  angles  (0x,X\,  621X2)’  where  6  and  x 
represent  rotation  of  molecules  1  and  2  within  and  out  of  the  plane  of  the 
page,  respectively.  Diatomic  bond  vectors  in  the  T  and  L  configurations 
therefore  are  coplanar,  whereas  those  in  the  X  configuration  are  perpendicu¬ 
lar.  Clockwise  rotations  are  positive,  and  all  angle  values  are  zeroed  to  those 
in  the  parallel  configuration  (not  shown). 


The  site-site  separation  distance  rab  has  been  expressed  in 
terms  of  the  center-of-mass  (COM)  separation  vector  (r/7)  for 
interacting  molecules  i  and  j ,  and  the  individual  bond  vectors 
1,  and  1 j  (of  lengths  /,  and  Ij).  The  full  interaction  of  two 
diatomics  is  then 
2  2 

(3) 

0=1  b=  1 

The  potential  parameters  a,  e,  and  r0  were  chosen  to  roughly 
approximate  compressed  nitrogen  fluid  on  its  shock 
Hugoniot  locus.  Details  of  the  fitting  procedure  used  to 
determine  these  parameters  will  be  described  in  an  upcoming 
publication."6  The  final  values  are 

e  =  34.156  K, 

r0=  4.037  A,  (4) 

a=  12.29. 

We  enforced  a  minimum  allowable  rab  slightly  greater  than 
the  N2  bond  length  (r™n=1.20  A)  as  a  guarantee  of  smooth 
behavior  throughout  the  simulations.  In  testing  the  N(MC)2 
procedure,  /,  and  lj  were  fixed  at  1.10  A  in  the  full  system  but 
shortened  in  0.05  A  increments  to  generate  a  series  of  refer¬ 
ence  systems.  The  reference  potential  approaches  a  purely 
spherical  COM  interaction  as  (,—>0,  thus  providing  a  poorer 
approximation  to  the  full  potential.  Because  bond  lengths  in 
the  full  system  are  fixed,  and  /,=/;  in  all  reference  systems, 
we  will  refer  only  to  l  (and  always  in  the  context  of  the 
reference  system)  in  what  follows. 

Although  each  site-site  interaction  described  by  Eq.  (1) 
is  spherical,  the  sum  of  these  interactions  (3)  for  a  pair  of 
molecules  is  highly  anisotropic.  Figure  1  schematically  illus¬ 
trates  some  of  the  quantities  appearing  in  Eqs.  (1)  and  (2)  for 
a  pair  of  molecules.  The  pair  is  drawn  in  three  fiducial 
configurations"6  labeled  T,  L,  and  X,  defined  by  the  quartet 
of  angles  {0x,X\,02,Xt)-  @(x)  's  the  angle  in  (out  of)  the 
plane  of  the  paper,  and  angles  are  zeroed  to  the  configuration 
in  which  the  molecules  are  parallel  to  one  another  (not 
shown).  The  subscripts  label  the  molecules.  Figure  2  displays 
the  variation  in  potential  for  each  of  these  configurations  as  a 
function  of  COM  separation  and  bond  length  /.  The  full 


FIG.  2.  Illustration  of  the  pair  potential  defined  by  Eqs.  ( 1 ) — (4)  as  a  function 
of  the  COM  separation  r,  .  The  full  potential  (/=  1.10  A)  is  compared  to  a 
spherical  potential  (/= 0,  where  all  configurations  are  equivalent),  as  well  as 
to  the  reference  potential  with  the  shortest  bond  length  (1=0.90  A).  Varia¬ 
tions  in  potential  <prj  are  plotted  on  a  log  scale,  revealing  its  highly  aniso¬ 
tropic  character.  See  Fig.  1  for  an  explanation  of  the  X.  T.  and  L 
configurations. 

potential  (/=  1.10  A)  is  compared  with  a  purely  spherical 
potential  (/=0)  and  one  of  the  reference  potentials 
(/= 0.90  A)  used  below.  The  ordinate  is  drawn  on  a  log 
scale,  and  it  is  clear  that  the  reference  potential  may  differ 
substantially  both  from  the  full  potential  and  from  that  of  a 
purely  isotropic  interaction. 


III.  STANDARD  MC  SAMPLING 

In  keeping  with  an  earlier  presentation  of  the  N(MC)2 
method,  we  have  adopted  the  structure  and  notation  of  Ref. 
29  to  describe  MC  sampling.  Matrices  are  indicated  by  bold 
lettering,  and  their  individual  elements  by  a  subscripted, 
italicized  form  of  the  same  symbol.  The  system  is  described 
by  a  state  vector  77,  each  element  of  which  defines  the  prob¬ 
ability  that  the  system  is  in  state  7r;.  These  probabilities  vary 
as  steps  are  added  to  the  Markov  chain,  a  process  performed 
by  repeated  application  of  the  transition  matrix  p  to  the  state 
vector  77, 

77r(n)p  =  77  T(n  +  1) .  (5) 

Here  we  have  written  the  states  as  transposed  column  vectors 
and  indicated  the  step  number  by  n.  Following  Metropolis,6 
we  demand  that  the  77,  be  asymptotically  distributed  accord¬ 
ing  to  their  Boltzmann  weights, 

77 *  °c  eWi,  (6) 

where 

77*  =  lim  7r(l)p".  (7) 

n— >00 

30 

In  the  isothermal-isobaric  ensemble,  for  which  the  corre- 

31 

sponding  potential  is  the  Gibbs  free  energy,' 

Wi  =  -  /3{Ui  +  PV,)  +  N  In  V,.,  (8) 
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Uj  and  Vj  are  the  internal  energy  and  volume  (respectively) 
of  state  i,  N  is  the  total  number  of  atoms  in  the  system,  and 
f3  has  its  usual  meaning  as  the  inverse  product  of  temperature 
with  the  Boltzmann  constant  ( kBT)~l .  A  simple  step  toward 
realization  of  Eq.  (6)  is  construction  of  the  transition  matrix 
p  such  that 

TT*p=TT*,  (9) 

meaning  that  once  reached,  the  limiting  distribution  is  per¬ 
manently  maintained.  This  is  known  as  the  balance  condi¬ 
tion.  The  elements  of  p  define  the  probability  of  transition 
between  various  states,  meaning  that 

PU  s  0  V  i,j.  (10) 

Conservation  of  probability  mandates  that 

2 Pi/=lVi  (11) 

i 

as  well.  Equation  (11)  identifies  p  as  a  stochastic  matrix.  The 
Markov  chain  is  irreducible  (or  ergodic)  if  there  exists  some 
n  such  that 


[p"]y>  0,  V  i,j,  (12) 

establishing  that  any  final  state  can  be  reached  from  any 
initial  state  simply  by  repeated  application  of  p  to  an  (arbi¬ 
trary)  initial  state  vector  tt(1).  If  p  is  stochastic,  irreducible, 
and  aperiodic,  the  Perron-Frobenius  theorem4  ensures  that  it 
possesses  a  single  left  eigenvector  having  unit  eigenvalue, 
and  that  this  eigenvector  represents  the  limiting  distribution. 
This  guarantee  does  not,  however,  ensure  that  the  limiting 
distribution  is  the  Boltzmann  distribution.  For  this,  an  ex¬ 
plicit  form  of  pij  compatible  with  Eq.  (6)  must  be  specified. 

A  helpful  constraint  in  this  regard  is  microscopic  revers¬ 
ibility. 


TTiPij=TTjPji.  (13) 

Although  Eq.  (13)  is  unnecessarily  strong,1'32  its  combination 
with  Eq.  (11)  guarantees  satisfaction  of  Eq.  (9).  We  now 
restrict  py  to  the  product  form 

P,i  =  ‘lii<*ir  (14) 

where  q y  is  the  (marginal)  probability  of  making  a  trial  step 
from  state  i  to  state  j  and  ay  is  the  (conditional)  probability 
of  accepting  such  a  move.  The  average  number  of  systems 
attempting  this  transition  will  be  'nlqip  and  the  average  num¬ 
ber  attempting  the  reverse  transition  will  be  7r;(y;/.  Metropolis 
et  al .6  were  the  first  to  show  that  Eq.  (6)  will  be  satisfied 
when 


■  i  \ 

t/:  =  min  — ,  1  , 

\  q.jTT,  / 


(15) 


if  the  7 Tj  are  defined  as  ew>.  In  the  (very  common)  event  that 
the  marginal  distribution  is  uniform  and  thus  q,,-qy  by  con¬ 
struction,  this  reduces  to 

ay  =  min(ew~wi,l) .  (16) 

The  choice  of  ay  given  in  Eqs.  (15)  and  (16)  satisfies  micro¬ 
scopic  reversibility  as  well,  so  long  as  there  is  a  nonzero 
probability  of  remaining  in  the  same  state, 

Pu  =1-2  Pij  *  0.  (17) 

m 

The  matrix  elements  q  -y  represent  the  probability  of  making  a 
trial  move,  such  as  a  displacement  or  a  volume  change.  For 
single-particle  displacements  limited  to  a  sphere  of  cutoff 
radius  rc,  qy  is  the  uniform  probability  of  choosing  a  trial 
state  j  in  which  a  single  particle  has  been  moved  to  a  differ¬ 
ent  point  within  the  sphere;  this  uniformity  is  what  permits 
reduction  of  Eq.  (15)  to  Eq.  (16).  For  more  sophisticated 
move  types  such  as  the  composite  moves  introduced  below, 
the  distribution  of  trial  moves  may  not  be  uniform,  in  which 
event  q  will  assume  a  more  complicated  form.  In  that  case, 
the  simple  decomposition  of  py  assumed  in  Eq.  (14)  can  be 
leveraged  to  yield  the  new  matrix  a  in  relatively  straightfor¬ 
ward  fashion.  This  procedure  is  illustrated  in  Sec.  IV. 


IV.  NESTED  MARKOV  CHAIN  MC 

The  N(MC)2  procedure  distinguishes  the  full  system  of 
interest  from  a  reference  system  defined  by  an  alternate  po¬ 
tential.  In  what  follows,  reference  system  quantities  will  be 
indicated  with  superscripted  zeros.  Reference  and  full  system 
volumes  are  identical,  so  no  attempt  will  be  made  to  distin¬ 
guish  the  two. 

Consider  a  sequence  of  M  elementary  reference  steps 
connecting  configurations  i  and  j.  Each  of  these  steps  is  ac¬ 
cepted  on  the  basis  of  the  standard  Metropolis  criterion  (15) 
using  reference  system  energies.  We  wish  to  transform  this 
sequence  of  steps  accepted  in  the  reference  system  into  a 
trial  step  made  in  the  full  system.  The  full  system  qy  are  no 
longer  drawn  from  a  uniform  distribution;  rather,  they  are 
built  from  a  sequence  of  M  points  accepted  with  Boltzmann 
weight  in  the  reference  system.  What  is  the  appropriate  form 
of  the  new  ay,  the  acceptance  probability  in  the  full  system? 
The  full  system  qy  are 


a  -  TT  «(0)  -T~Ifl(0)  fl(0) 

Hij  -  1 1  Pk-\,k  ~  11 
k=  1  k=  1 

M  j 

=  II  dk-t  ,k  min 


k=l 


77 k  1 

\  77(0)  (0) 

\  ”k-vik-\,k 


.1  . 


(18) 


meaning  that 
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(19) 


18 

Following  Gelb,  note  that 


U(?)  +  c=U„ 


(24) 


JO) 


-l,*  1 


7 JO)  JO) 

77(0)  fl(0) 

At-m-U 


,1 


JO)  «(0) 

(01  •  I  At-m-l.it  , 

m- 1  mlnl  mi  mi  - 1 


77, 


,(0)  JO) 

it 


Jo) 

"A- 

J°)  - 
Tfc-l 


(20) 


which,  in  combination  with  reordering  of  factors  in  Eq.  (19), 
implies  that 


<h!: 

9ji 


if 

J0) 


X 


T(0) 

~2 

JO) 


X 


JO) 

*3 

JO) 


X 


7# 

%- 1 


if 

J0)' 


Substituting  Eq.  (21)  into  Eq.  (15)  gives 


77;  7rj0) 

“o  =  mml  ^^(O)’1  )• 


(21) 


(22) 


the  acceptance  probability  of  composite  moves  required  for 

Metropolis  sampling  of  the  full  potential.  In  comparing  a,, 
(0)  J 
with  ajj  ,  the  standard  ratio  of  Boltzmann  factors  for  initial 

and  final  states  of  the  full  system  has  been  augmented  by  the 
inverse  of  the  standard  ratio  in  the  reference  system  (to 
which  Gelb  refers  as  a  “correction  factor”).  The  7r[0>  in  Eq. 
(22)  are  evaluated  using  the  reference  system  temperature 
(7^0))  and  pressure  (P1-0'),  but  this  in  no  way  precludes  use  of 
a  different  pressure  (P)  and  temperature  (7)  in  building  7 Tk 
for  the  full  system.  Abbreviating  the  difference  between  full 
and  reference  potentials  for  state  k  as  Wk-  =  SWk  and 
SWj—  SW/=  AW,  Eq.  (22)  can  be  re-expressed  as 


e 


1,  A1T>0 

AIT  <  0, 


(23) 


If  the  reference  energy  always  was  related  to  the  full  energy 
by  a  simple  constant  shift 


the  product  in  Eq.  (22)  would  never  deviate  from  unity,  and 
thus  all  composite  moves  would  be  accepted,  regardless  of 
the  magnitude  of  M.  A  distribution  of  SW  implies  a  distribu¬ 
tion  of  AW,  the  mean  of  which  is  determined  by  the  extent  to 
which  the  reference  potential  deviates  from  the  full  (or  by  the 
number  of  reference  steps  between  full  energy  evaluations). 
Because  a  Dirac  <5(0)  distribution  of  AW  would  yield  unit 
acceptance  probability,  reducing  the  absolute  value  of  the 
first  two  moments  of  the  AW  distribution  raises  the  mean 
value  of  in  Eq.  (23).  These  moments  are  dictated  partly 
by  the  thermodynamic  state  of  the  reference  system,  a  fact 
upon  which  we  build  the  optimization  procedure  described  in 
Sec.  V. 

Unless  otherwise  indicated  all  full  system  results  are  for 
a  3-D  periodic  system  of  100  diatomic  molecules  at  tempera¬ 
ture  7  =  728  K  and  pressure  P=4.84  GPa,  although  our 
methodology  is  in  no  way  restricted  to  such  extreme  envi¬ 
ronments.  After  an  equilibration  period  of  (9(104)  reference 
steps,33  results  were  collected  from  an  additional  <9(107)  ref¬ 
erence  steps  and  averaged  over  5-10  Markov  chains  started 
from  randomly  chosen  initial  conditions. 

The  rate  of  convergence  for  ensemble  averages  depends 
on  the  statistical  independence  of  the  sampling  points  in  a 
sense  now  defined.  The  left  panel  of  Fig.  3  presents  the  dis¬ 
tribution  of  reference  energy  per  particle  u>>)>  as  given  by 
Eqs.  (l)-(4)  and  for  /=  1.00  A,  calculated  at  a  fixed  number 
of  steps  O  from  a  reference  configuration  j.  At  an  offset  O 
=  10  steps,  the  energies  u^Hj)  and  u(0>(j+O)  are  highly  cor¬ 
related  and  thus  the  distribution  is  narrowly  peaked  about 
u(i))(j).  As  the  offset  grows  larger,  the  distribution  widens 
gradually  up  to  <9=3000,  at  which  point  the  distribution 
ceases  to  broaden.  The  right  panel  provides  a  quantitative 
measure  of  this  effect  through  the  standard  deviation  cr  of  the 
Gaussian  distribution.  The  width  of  the  distribution  at  O 


0.12  200 


FIG.  3.  Correlation  of  reference  en¬ 
ergy  per  particle  u (0^  at  configurations 
j  with  those  separated  by  a  fixed  num¬ 
ber  of  steps  O  for  /=  1 .00  A.  The  left 
panel  shows  the  uniform  broadening 
of  the  Gaussian  distribution  as  the  off¬ 
set  rises.  The  right  panel  illustrates 
convergence  of  the  distribution  width, 
as  characterized  by  standard  deviation 
cr  for  several  different  values  of  /.  At 
offset  Ocorr=3000  steps,  the  energies 
are  approximately  decorrelated  and  the 
Gaussian  width  constant. 
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=  4000  is  indistinguishable  from  that  at  0=3000.  In  fact, 
energies  are  correlated  only  slightly  at  0  =  500,  but  a  conser¬ 
vative  estimate  of  0corr  is  made  in  order  to  clarify  the  bench¬ 
marks  provided  below.  We  define  the  correlation  length  0corr, 
then,  to  be  equal  roughly  to  3000  for  this  set  of  reference 
potentials. 

Gelb’s  presentation  of  the  N(MC)2  method ls  suggested  a 
metric  for  evaluating  its  computational  efficiency  (maximum 
speedup),  but  did  not  attempt  to  quantify  its  sampling  effi¬ 
ciency.  Because  the  reference  and  full  potentials  used  here  do 
not  differ  in  computational  expense,  we  reverse  the  emphasis 
and  defer  discussion  of  the  total  efficiency  (some  combina¬ 
tion  of  sampling  and  computational  efficiencies)  for  a  later 
work."6  The  sampling  efficiency  of  the  method  provides  a 
measure  of  the  rate  at  which  it  will  explore  the  relevant 
space.  This  quantity  is  not  determined  by  the  acceptance 
probability  alone,  but  in  balancing  the  need  to  separate  full 
energy  evaluations  by  as  many  reference  steps  as  possible 
(up  to  0corr)  with  maintenance  of  a  reasonable  acceptance 
probability  for  each  composite  move.  In  light  of  these  con¬ 
siderations,  we  define  the  sampling  efficiency  Es  for  a  given 
reference  potential  (characterized  here  by  0corr)  and  offset  0 
as 


Es{0,Oc0n)  = 


a  min(0,0colT) 


(25) 


where  a  is  the  average  acceptance  probability  of  a  composite 
move  from  state  i  to  state  j  when  the  states  are  separated  by 
0  reference  steps,  and  the  min  function  reflects  the  efficiency 
loss  in  increasing  0  beyond  0corr.  The  min  function  really 
should  be  replaced  by  one  passing  smoothly  to  0CO1T,  but  Eq. 
(25)  is  sufficient  for  our  purposes  here.  The  possible  range  of 
Es  as  defined  by  Eq.  (25)  is  [0,1],  and  the  goal  of  the  proce¬ 
dure  introduced  below  will  be  to  maximize  this  quantity 
through  variation  of  a.  If  0  is  large  but  a  is  small,  then 
accepted  composite  steps  will  explore  configuration  space 
rapidly  but  much  computational  effort  will  be  wasted  on  re¬ 
jected  steps;  for  small  0  and  large  a,  composite  steps  will  be 
accepted  with  high  probability  but  little  ground  will  be  cov¬ 
ered  in  the  process. 

We  now  examine  the  performance  of  N(MC)2  for  /  val¬ 
ues  in  the  range  of  0.90-1.05  A.  Figure  4  illustrates  the  ac¬ 
ceptance  probability  and  Fig.  5  the  sampling  efficiency  as 
defined  by  Eq.  (25)  for  0=1-3000.  As  the  reference  poten¬ 
tial  deviates  more  strongly  from  the  full  potential  (i.e.,  as  l 
decreases),  the  performance  of  the  method  deteriorates  rap¬ 
idly,  as  evidenced  by  the  downward  shift  in  both  the  accep¬ 
tance  probability  and  efficiency  curves.  As  is  to  be  expected, 
the  acceptance  probability  also  falls  as  the  magnitude  of  the 
offset  rises.  The  inset  in  Fig.  4  demonstrates  the  ability  to 
obtain  a  good  acceptance  probability  even  with  a  poor  refer¬ 
ence  potential,  albeit  at  the  cost  of  lowering  0  (meaning  that 
a  greater  number  of  total  sampling  points  will  be  required). 
In  this  context  it  is  important  to  emphasize  that  using  0 
values  close  to  0corr  is  ideal,  but  not  at  all  necessary  for 
sampling  the  full  potential  much  more  efficiently  than  with 
conventional  (MC)2.  On  this  point,  note  that  the  efficiency 
using  any  reference  potential  is  minimal  at  0=1,  which  cor¬ 
responds  roughly  to  conventional  (MC)2.  Efficiency  no 
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FIG.  4.  Acceptance  probabilities  obtained  using  the  N(MC)2  procedure  with 
the  potential  defined  by  Eqs.  ( 1 ) — (4) .  Bondlength  /  is  held  fixed  at  1.10  A  in 
the  full  system,  but  varied  from  1.05  to  0.90  A  in  the  reference  systems. 
Acceptance  probabilities  diminish  rapidly  as  the  offset  O  between  full  en¬ 
ergy  evaluations  increases  or  as  the  similarity  of  the  reference  and  full 
potentials  lessens.  For  all  points,  statistical  errors  in  a  are  smaller  than  the 
symbol  size.  The  inset  enlarges  the  behavior  at  small  O,  showing  that  better 
acceptance  probabilities  can  be  obtained  even  with  the  poorest  reference 
potential  by  shortening  the  reference  system  Markov  chain. 


longer  increases  monotonically  with  the  offset  as  the  refer¬ 
ence  potential  deviates  more  strongly  from  the  full  potential; 
results  for  /= 0.95  A  and  /  =  0.90  A  exhibit  maxima  around 
0  =  250  steps.  These  results  will  be  scrutinized  quantitatively 
below,  after  introducing  an  optimized  variant  of  N(MC)2. 
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FIG.  5.  Sampling  efficiencies  Es  corresponding  to  the  acceptance  probabili¬ 
ties  shown  in  Fig.  4  and  as  defined  by  Eq.  (25)  in  the  text.  Local  maxima 
occur  at  small  O  as  the  quality  of  the  reference  potential  deteriorates,  but 
note  efficiencies  are  minimal  at  0  =  1,  corresponding  roughly  to  that  of 
standard  MC.  The  inset  enlarges  the  behavior  at  small  O. 
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V.  OPTIMIZED  N(MC)2  PROCEDURE 


The  average  acceptance  probability  A  for  composite 
steps  connecting  configurations  i  and  j  can  be  expressed  ex¬ 
actly  in  the  limit  that  i  and  j  are  fully  decorrelated. 


A  =  lim 

o^o, 


(26) 


The  initial  states  i  will,  by  construction,  possess  relative 
weights  ew<  drawn  from  the  full  distribution.  The  final  states 

j  will,  in  the  (9corr  limit,  be  drawn  randomly  from  the  refer- 

w(o) 

ence  distribution  and  thus  carry  weights  e  i  .  The  accep¬ 
tance  probability  of  a  composite  trial  step  from  state  i  to  state 
j  is  atj,  and  this  quantity  is  averaged  over  the  configuration 
and  volume  spaces  of  all  decorrelated  (i,j)  pairs  to  obtain  A, 

f  f  f  f  a,yeWi+wi  } dTjdVjdTjdVj 
f  f  f  f  ew'+w!  ] dTidV jdTjdV j 
_  fill  aijeSWi{e^)+w{i)))dTidV jdTjdV  j 

~  ffffe^ie^^dTtdVjTjdVj  ' 

Because  composite  steps  are  built  from  a  sequence  of  el¬ 
ementary  moves  accepted  with  Boltzmann  weight  in  the  ref¬ 
erence  system,  the  terms  appearing  in  parenthesis  in  Eq.  (27) 
are  implicitly  taken  into  account  when  (MC)2  sampling  on 
the  basis  of  the  reference  potential.  Indicating  a  double  av¬ 
erage  over  initial  and  final  states  by  nested  brackets,  Eq.  (27) 
can  be  condensed  as  follows: 


Q 

«em'))0 


(28) 


where  the  subscripted  “0”  indicates  that  the  averaging  is  per¬ 
formed  entirely  in  the  reference  ensemble.  A  can  be  built 
from  Eq.  (28)  by  sampling  SW  at  a  collection  of  decorrelated 
configurations  (each  separated  by  (9  >  (9corr  reference  steps), 
meaning  that  the  sampled  points  will  be  drawn  purely  from 
the  reference  distribution  and  thus  without  application  of  Eq. 
(22).  We  refer  to  this  reference  distribution  sampling  as  the 
“reweighting”  calculation,34’35  and  A  evaluated  on  this  basis 
will  be  denoted  Arw.Arw  constitutes  an  a  priori  estimate  of  A, 
in  the  sense  that  it  provides  an  acceptance  probability  for 
N(MC)2  composite  steps  (but  only  in  the  (9con.  limit)  without 
recourse  to  an  actual  N(MC)2  simulation. 

We  now  step  through  the  procedure  for  performing  an 
optimized  N(MC)2  simulation  at  a  prescribed  set  of  thermo¬ 
dynamic  conditions  (P=P'  ,T=T').  The  reference  (full)  sys¬ 
tem  weights  w[01  ( Wk)  appearing  in  Eq.  (27)  depend  on  the 
reference  (full)  system  temperature  and  pressure  through  Eq. 
(8),  meaning  that  AA/(n(0),  7R.P,  T).  Hereafter,  the  vari¬ 
able  dependencies  of  A  will  be  listed  in  this  order.  From  a 
single  set  of  reference  configurations  collected  in  the  re¬ 
weighting  calculation  at  (P(0),7^0)),  a  family  of  Arw  differing 
only  in  the  values  assigned  to  (P,  T)  can  be  constructed  from 
Eq.  (28).  Because  it  is  the  thermodynamic  state  of  the  full 
system  only  that  we  wish  to  match  with  experiment, 
(p(°\  yf°))  can  be  treated  separately  from  (P,T)  and  the  latter 
varied  as  free  parameters  in  order  to  maximize  Arw  for  a 


given  set  of  configurations.  Previously  4  we  applied  a  similar 
idea  to  the  thermodynamics  of  fluid  N2  as  described  by  DFT, 
but  strictly  in  the  context  of  reweighting  configurations  al¬ 
ready  sampled  using  traditional  (MC)2.  The  reference  system 
parameters  can  be  varied  to  yield  maximal  Arw=A(0)  at  op¬ 
timal  (P(0) = _P^t ,  7^0)  =  T^pj) , 

\,P',T')  =  max{A,.M,(x, y,P,T):P^p  <  x 


<  P^\lf^  —  y  —  lf):P  =  P’,T=  T’}, 


(29) 


where  the  reference  system  (x,y)  =  (P*°\7^0))  has  been 
scanned  over  a  given  domain.  This  approach  is  permissible, 
but  requires  iteratively  resampling  SW  (which  includes 
evaluation  of  the  full  potential).  Alternatively  one  can  satisfy 


Amax(P',r,Popt,Topt)  =  maxRJP^T^x.y):^  =  P’,l<0) 
=  r:P1<x<P2,7’1<y<7’2},  (30) 

using  the  same  set  of  reference  configurations  for  each  (x,y) 
pair.  In  general,  A4),|x  is  a  function  of  (P,T)  and  Amax  is  a 
function  of  (pRTR;  in  Eqs.  (29)  and  (30),  we  have  speci¬ 
fied  an  actual  value  for  these  functions  at  designated  values 
of  (P^°\T^°\P  ,T).  Upon  solution  of  Eq.  (30)  there  are  at 
least  two  different  ways  of  returning  the  full  system  to  the 
thermodynamic  state  of  interest  at  {P=P’  ,T=T’).  The  first  is 
to  collect  reweighting  samples  at  multiple  (P^°\T^),  solve 
Eq.  (30)  at  each  thermodynamic  state  to  yield  a  set  of  corre¬ 
sponding  (PopRopt)  pairs,  calculate  ensemble  averages  at 
each  new  pair  using  the  optimized  N(MC)2  procedure,  then 
interpolate  between  those  averages  to  obtain  approximate 
values  at  the  (P=P’  ,T=T’)  combinations  desired.  We  will 
take  this  approach  in  a  future  publication”6  where  N(MC)2 
will  be  used  to  characterize  the  shock  Hugoniot  locus  of  N2 
over  a  wide  range  of  thermodynamic  conditions.  Here  we 
assume  a  simpler  approach,  more  suitable  for  use  of  N(MC)2 
at  an  isolated  thermodynamic  state.  After  solving  Eq.  (30)  for 
(Apt>  PoptX  we  linearly  extrapolate  back  to  the  original,  de- 


TABLE  I.  Summary  of  optimized  N(MC)2  parameters  for  four  different 
reference  potentials  (/= 0.90- 1.05  A).  Acceptance  probabilities  Arw  are  de¬ 
fined  a  priori  by  evaluation  of  Eq.  (28)  with  reweighting  samples;  AMC  are 
calculated  a  posteriori  from  N(MC)2  simulation  at  the  specified  conditions. 
A  max  represents  Arw  following  solution  of  Eq.  (30).  All  acceptance  probabili¬ 
ties  are  listed  in  the  form  A=f(P^°\T^°\P,T).  Uncertainties  in  the  final 
digit  of  the  acceptance  probabilities  are  indicated  in  parenthesis. 


/(A) 

1.05 

1.00 

0.95 

0.90 

r  (k) 

728 

728 

728 

728 

P’  (GPa) 

4.84 

4.84 

4.84 

4.84 

Top,  (K) 

756 

796 

846 

867 

Pa P,  (GPa) 

5.08 

5.35 

5.74 

6.08 

fO)  (jQ 

4  (GPa) 

700 

660 

610 

589 

4.60 

4.33 

3.94 

3.60 

Am(p’  ,r,p'  ,r) 

0.367(5) 

0.069(3) 

0.010(1) 

0.001(0) 

Auc(P',T'.P',r) 

0.355(6) 

0.084(3) 

0.018(3) 

0.006(1) 

Amax(P  ’  ,  7’  ,  Popt ,  70pt) 

0.512(5) 

0.195(5) 

0.061(3) 

0.024(2) 

AmcCp  ’ .  t'  ,  popt ,  ropt) 

0.553(7) 

0.279(9) 

0.077(9) 

0.030(5) 

^c(<Ro0pR'.n 

0.548(6) 

0.252(7) 

0.078(6) 

0.021(5) 
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FIG.  6.  Maximization  of  the  acceptance  probability  Am,(P'  ,7'  ,P,7)  by  variation  in  full  system  pressure  and  temperature  in  Eq.  (30).  The  contours  in  the  left 
panel  show  acceptance  probabilities  for  pressures  of  4.75-5.75  GPa  and  temperatures  of  750-850  K.  The  arrow  points  in  the  direction  of  uniformly  increasing 
contour  values;  the  dotted  line  indicates  the  isotherm  containing  the  maximum  acceptance  probability.  Three  such  isotherms  are  plotted  in  the  right  panel, 
including  the  one  containing  the  Am  maximum  at  7=790  K. 


sired  (P',7")  and  apply  the  same  transformation  to  the  ref¬ 
erence  variables,  yielding  approximate  (P^,  7^), 

<1  -  P{0)  +  (P’~  Popt),  «  ^0)  +  (T'  -  ropt)  .(31) 

Optimized  N(MC)2  is  then  performed  at  (P®,  7^,7" ,  T'). 
Concrete  examples  of  this  procedure  are  shown  in  Table  I. 
Beginning  at  T=Ii0)=Tl  =  728  K  and  P=P^=P' 
=4.84  GPa,  unoptimized  N(MC)2  simulations  were  carried 
out  with  all  four  reference  system  bond  lengths  /  and  the 
resultant  AMC(P'  ,T'  ,P'  ,T')  recorded.  AMC  values  represent 
an  a  posteriori  estimate  of  A,  calculated  simply  as  the  num¬ 
ber  of  accepted  composite  steps  divided  by  the  total  number 
of  composite  trial  steps  in  a  N(MC)2  simulation.  The  refer¬ 
ence  distribution  of  SW  was  then  sampled  at  <9(104)  points, 
from  which  Arw(P’  ,T  ,P'  ,T')  was  built  using  Eq.  (28).  P 
and  T  were  varied  with  Eq.  (30)  to  yield  Amax  and  (Popt,  7'opt), 
then  Eq.  (31)  was  used  to  generate  (pf\,  7^).  Finally, 
N(MC)2  simulations  using  the  two  optimized  sets 
(P',r,Popt,Topt)  and  were  performed  to 

yield  the  corresponding  AMC.  Note  that  optimized  reference 


system  variables  were  obtained  by  solution  of  Eq.  (31),  not 
Eq.  (29);  thus,  no  a  priori  estimate  of  AMC(P®,T®,P',r') 
is  available.  Numbers  in  parenthesis  indicate  statistical  un¬ 
certainty  in  the  final  digit  recorded.  Discrepancies  of  greater 
than  one  cr  between  theoretical  and  computed  values  most 
likely  reflect  use  of  incompletely  decorrelated  samples. 

We  found  the  surface  describing  Arw  as  a  function  of  P 
and  T  (or  P(01  and  7i(l))  to  be  generally  smooth,  and  the 
scanned  range  of  pressure  and  temperature  values  can  be 
squeezed  iteratively  in  combination  with  finer  meshes  until  a 
maximum  is  located;  this  approach  proceeds  with  little  diffi¬ 
culty.  The  left  panel  in  Fig.  6  presents  a  contour  plot  of 
predicted  acceptance  probabilities  scanned  over  range  of 
750-850  K  and  4.75-5.75  GPa  for  the  reference  potential  / 
=  1.00  A.  Contour  values  were  obtained  in  the  process  of 
solving  Eq.  (30),  so  the  temperature  and  pressure  being  var¬ 
ied  are  that  of  the  full  system.  The  arrow  points  in  the  direc¬ 
tion  of  uniformly  increasing  contour  values,  and  the  vertical 
dotted  line  marks  the  temperature  at  which  Arw  is  maximal.  A 
trio  of  acceptance  probability  “isotherms”  drawn  from  the 
contour  plot  is  depicted  in  the  right  panel;  these  curves  scan 
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FIG.  7.  Convergence  of  optimized  (P=Popt,T=Tapl)  as  a  function  of  the  number  of  reweighting  points  /V. , .  For  each  value  of  {Popt,Topt)  are  found  by 
solution  of  Eq.  (30)  using  A built  from  Eq.  (28).  As  the  quality  of  the  reference  potential  increases  (/— >  1.10  A),  so  does  the  speed  of  convergence. 
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FIG.  8.  Convergence  of  the  optimal  acceptance  probability 
Aopt(P' ,  T' .  Popt .  7’i;|:i()  as  a  function  of  reweighted  sampling  points  Nrw .  A rw  is 
built  from  Eq.  (28)  and  optimized  using  Eq.  (30).  All  four  potentials  roughly 
converge  within  1000  steps,  although  convergence  becomes  slightly  oscilla¬ 
tory  as  the  potential  deviates  further  from  the  full  potential.  For  Nm 
>  1000,  the  statistical  errors  are  roughly  the  size  of  the  symbols. 


Arw  over  a  range  of  pressures  at  fixed  temperature.  The  over¬ 
all  maximum  is  clearly  identifiable  at  T=  790  K  and  P 
=  5.35  GPa. 

Solution  of  Eq.  (29)  or  Eq.  (30)  requires  sampling  at 
enough  points  to  provide  a  reliable  estimate  of  An,  from  Eq. 
(28).  If  such  estimates  require  a  large  number  of  sampling 
points,  then  the  sampling  efficiency  gained  by  optimizing 
N(MC)2  will  be  lost  in  the  overhead  of  performing  the  opti¬ 
mization  itself.  It  is  reasonable,  then,  to  ask  how  many  sam¬ 
pling  points  Nrw  are  required  to  predict  stable  values  of 
(^Wopt)  or  (P^ptXpt).  The  convergence  of  (Popt,  Topt)  with 
respect  to  Nrw  is  illustrated  in  Fig.  7,  where  it  appears  to  be 
faster  for  reference  potentials  closer  to  the  full  potential; 
while  (P0pt,Topl)  for  (=1.05  A  are  converged  at  fV,.H,=  1000, 
(T’opo  Topt)  for  /= 0.90  A  are  clearly  unconverged  even  for 
Nrw= 5000.  We  hasten  to  note,  however,  that  convergence  of 
the  acceptance  probability  is  much  more  important  than  con¬ 
vergence  of  the  thermodynamic  parameters.  If  Aopt  exhibits  a 
broad,  fiat  peak  when  expressed  as  a  function  of  P  and  T, 
then  strict  convergence  of  the  latter  two  is  not  necessary  to 
ensure  a  dramatically  improved  acceptance  probability.  Fig¬ 
ure  8  confirms  that  this  is  indeed  the  case:  the  Aopt  for  all  four 
reference  potentials  stabilize  at  roughly  1000  steps  to  the 
(Fopt,  7opt)  shown  in  Table  I. 

Having  obtained  solutions  to  Eq.  (30)  and  extrapolated 
back  to  (Pj,^)  with  Eq.  (31),  we  now  examine  the  per¬ 
formance  of  optimized  N(MC)2  using  the  new  set  of  thermo¬ 
dynamic  reference  variables.  Optimal  acceptance  probabili¬ 
ties  and  efficiencies  are  shown  in  Figs.  9  and  10  and  should 
be  compared  directly  to  those  of  Figs.  4  and  5,  respectively 
(note  that  the  ordinate  scales  in  Figs.  5  and  10  differ).  Im¬ 
provements  in  acceptance  probability  as  a  percentage  of  the 
unoptimized  values  for  0=  100-3000  are  shown  in  Table  II. 
Improvement  is  significant  for  all  potentials  at  all  values  of 
O,  but  the  marginal  gain  increases  as  O  grows  larger  and  (in 
general)  as  the  reference  potential  deviates  more  strongly 
from  the  full  potential.  The  greatest  performance  improve¬ 
ments  are  for  /= 0.95  A,  possibly  indicating  that  already  at 
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FIG.  9.  Average  acceptance  probabilities  a  for  the  optimized  N(MC)2  pro¬ 
cedure  as  a  function  of  the  number  of  reference  steps  O  taken  between  full 
energy  evaluations.  Results  should  be  compared  with  the  those  of  the  unop¬ 
timized  procedure,  shown  in  Fig.  4. 


/= 0.90  A  the  physics  embodied  by  the  reference  potential 
starts  to  deviate  too  strongly  from  that  of  the  full  potential 
for  the  optimization  procedure  to  be  fully  effective. 

The  distribution  SU  =  U-  Uiiy>  of  potential  energy  differ¬ 
ences  sampled  by  trial  states  in  N(MC)2  interpolates  between 
the  distribution  found  by  Metropolis  sampling  on  the  basis  of 
the  reference  or  full  potential  alone:  for  small  O,  the  distri¬ 
bution  of  trial  SU  is  roughly  that  encountered  in  sampling  on 
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FIG.  10.  Sampling  efficiency  Es  for  the  optimized  N(MC)2  procedure  as  a 
function  of  the  number  of  reference  steps  O  taken  between  full  energy 
evaluations.  Results  should  be  compared  to  those  of  the  unoptimized  proce¬ 
dure,  shown  in  Fig.  5. 
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TABLE  II.  Improvements  in  efficiency  for  optimized  relative  to  unopti¬ 
mized  N(MC)2,  as  a  percentage  of  efficiency  for  the  latter  and  as  a  function 
of  the  reference  system  Markov  chain  length  O. 


0 

(=1.05  A 

/=  l.oo  A 

1=0.95  A 

(=0.90  A 

100 

17 

35 

31 

20 

250 

26 

67 

100 

60 

500 

31 

89 

167 

175 

1000 

51 

179 

471 

350 

2000 

56 

210 

473 

533 

3000 

54 

182 

359 

250 

the  basis  of  U  only;  in  the  Ocolr  limit,  the  SU  distribution 
corresponds  exactly  to  that  found  in  sampling  strictly  on  the 
basis  of  t/(0)  (in  this  case,  the  trial  state  loses  its  “memory” 
of  the  initial  state).  Because  trial  states  are  assigned  Boltz¬ 
mann  weight  in  the  full  system,  the  acceptance  probability  of 
N(MC)2  steps  is  reflected  indirectly  in  the  overlap  of  the  SU 
distributions  for  trial  and  accepted  states  (both  distributions 
are  a  function  of  volume).  Figure  11  illustrates  Su  [where 
u  =  (U-Um)/N ]  versus  v  (v  =  V/N)  for  /=  1 .05  A  and  O 
=  1000.  Unoptimized  N(MC)2  results  are  shown  in  the  left 
panel  and  optimized  results  in  the  right.  The  overlap  of  the 
trial  (black  points)  and  accepted  [gray  (red)  points]  distribu¬ 
tions  increases  significantly  upon  optimization,  where  we 
have  indicated  the  values  of  (P0,  T0,P,  T)  used  in  the  under¬ 
lying  simulations.  Figure  12  illustrates  the  same  data,  but  for 
/= 0.90  A  and  (9=1000.  Again,  optimization  increases  the 
overlap  considerably.  The  density  of  accepted  points  is  much 
lower  than  in  Fig.  11,  in  keeping  with  the  acceptance  prob¬ 
abilities  listed  in  Table  II.  The  system  exhibits  also  a  strong 
tendency  to  become  “trapped”  at  certain  volumes,  indicated 
by  the  broken,  vertical  collections  of  trial  points.  As  stated  in 
Sec.  IV,  the  acceptance  probability  for  N(MC)2  steps  remains 
good  even  with  a  poor  reference  potential  if  one  is  willing  to 
employ  a  smaller  O.  Figure  13  shows  the  same  distributions 
as  in  Fig.  12,  but  for  (9  =  250  instead  of  (9=1000.  Not  only 
are  the  overlaps  between  trial  and  accepted  distributions 
greater  before  and  after  optimization,  but  the  volume  trap¬ 
ping  noted  in  connection  with  Fig.  12  is  absent  almost  en¬ 
tirely.  There  is  considerably  more  skew  in  the  trial  distribu- 
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FIG.  12.  (Color  online)  Distribution  of  energy  difference  per  particle  (Su 
**(U—lr®)/ff)  vs  volume  per  particle  ( v  =  VIN )  for  the  unoptimized  (left) 
and  optimized  (right)  N(MC)2  procedure  at  /= 0.90  A  and  offset  0=1000. 
The  black  data  represent  trial  composite  steps;  the  gray  (red)  data  are  ac¬ 
cepted  composite  steps;  accepted  points  have  been  enlarged  slightly  relative 
to  trial  points  in  order  to  enhance  contrast.  The  AW  appearing  in  Eq.  (23)  are 
found  from  the  difference  in  trial  and  accepted  points.  Note  the  change  in 
overlap  of  the  distributions  upon  optimization. 

tion  for  smaller  offsets;  this  reflects  correlation  between 
initial  and  final  points  in  the  composite  trial  step.  Although 
this  combination  of  reference  potential  and  (9  yields  high 
overlap  and  thus  a  good  acceptance  probability,  many  more 
sampling  points  will  be  required  to  minimize  the  variance  in 
ensemble  averages. 

From  the  sets  of  trial  and  accepted  points  shown  in  Figs. 
11  and  12,  one  can  build  the  AW  distributions  appearing  in 
Eq.  (23),  thus  establishing  a  direct  link  between  trial  distri¬ 
butions  and  acceptance  probabilities.  Figure  14  depicts  the 
AW  distributions  built  from  Figs.  11  (left  panel)  and  Fig.  12 
(right  panel).  As  expected,  the  distributions  for  the  better 
reference  potential  (/=  1.05  A)  are  narrower  and  their  means 
lie  closer  to  zero.  Optimization  substantially  lowers  the  ab¬ 
solute  value  of  ((AW))  and  cr(AW)  for  both  values  of  l  (we 
have  indicated  the  double  averaging  over  initial  and  final 
points  by  nested  brackets),  and  both  of  these  factors  contrib¬ 
ute  to  a  higher  average  acceptance  probability  for  composite 
moves.  This  exercise  provides  an  alternate  means  of  evalu¬ 
ating  the  performance  of  the  optimization  procedure,  in  that 
the  first  two  moments  of  the  AW  distribution  are  closely 
related  to  the  acceptance  probability.  Table  III  provides  an 
overview  of  these  moments  (before  and  after  optimization) 
for  all  of  the  potentials  surveyed. 


FIG.  11.  (Color  online)  Distribution  of  energy  difference  per  particle  (Su 
=  (((/-  I/®)  /  AO)  vs  volume  per  particle  ( v  =  V/N )  for  the  unoptimized 
(left)  and  optimized  (right)  N(MC)2  procedure  at  /=  1.05  A  and  offset  O 
=  1000.  The  black  data  represent  trial  composite  steps;  the  gray  (red)  data 
are  accepted  composite  steps.  The  A W  appearing  in  Eq.  (23)  are  found  from 
the  difference  in  trial  and  accepted  points.  Note  the  increase  in  overlap  of 
these  distributions  upon  optimization,  and  thus  the  decrease  in  ((AW)). 
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FIG.  13.  (Color  online)  Distribution  of  energy  difference  per  particle  (Su)  vs 
volume  for  the  unoptimized  (left)  and  optimized  (right)  N(MC)2  procedures 
at  /=0.90  A  and  offset  0  =  250.  The  black  data  represent  trial  composite 
steps;  the  gray  (red)  data  are  accepted  composite  steps.  The  AW  appearing 
in  Eq.  (23)  are  found  from  the  difference  in  trial  and  accepted  points.  Even 
for  a  poor  reference  potential,  significant  acceptance  probabilities  can  be 
achieved  by  lowering  O  (cf.  Fig.  12).  Note  that  for  lower  values  of  O,  the 
correlation  of  Su  and  v  remains  even  in  the  trial  distribution. 
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FIG.  14.  Direct  visualization  of  the  AW  appearing  in  Eq.  (23),  taken  as  the  difference  in  SW  for  the  trial  and  accepted  distributions  shown  in  Fig.  11  (left 
panel)  and  Fig.  12  (right  panel).  Upon  optimization,  the  mean  value  of  AW  shifts  closer  to  zero  and  the  width  of  the  distribution  shrinks  in  both  cases.  Both 
of  these  factors  contribute  to  a  higher  acceptance  probability  for  composite  moves. 


VI.  SUMMARY 

Nested  Markov  chain  MC  (N(MC)2)  allows  (MC)2  sam¬ 
pling  with  a  potential  of  given  accuracy  (the  full  potential) 
without  needing  to  evaluate  it  following  every  elementary 
step.  By  stringing  together  a  sequence  of  single -particle 
moves  accepted  on  the  basis  of  a  more  approximate  potential 

TABLE  III.  First  two  moments  in  the  distribution  of  AW  appearing  in  Eq. 
(23),  as  a  function  of  the  reference  system  Markov  chain  length  O  and  for  all 
of  the  reference  potentials  surveyed.  The  double  brackets  indicate  averaging 
over  initial  and  final  points  [i  and  j  in  Eq.  (27)],  and  values  are  listed  before 
and  after  (opt)  optimization. 


«AW» 

«AW»opt 

cr(AW) 

<r„pt(AW) 

/=  1.05  A 

0=100 

-0.27 

-0.08 

1.88 

1.32 

0=250 

-0.59 

-0.19 

1.95 

1.34 

0=500 

-0.86 

-0.35 

2.08 

1.37 

0=1000 

-1.38 

-0.51 

2.34 

1.43 

0=  2000 

-1.59 

-0.63 

2.46 

1.47 

0  =  3000 

-1.68 

-0.71 

2.53 

1.77 

/=  1.00  A 

0=100 

-1.08 

-0.31 

3.84 

2.72 

0=250 

-2.27 

-0.73 

4.46 

2.80 

0=500 

-3.27 

-1.33 

4.98 

3.01 

0=1000 

-5.10 

-1.89 

6.32 

3.23 

0=2000 

-5.89 

-2.40 

6.96 

3.61 

0=3000 

-5.99 

-2.60 

6.99 

3.67 

Z=0.95  A 

0  =  100 

-1.92 

-0.67 

5.70 

4.47 

0=250 

-4.46 

-1.55 

7.34 

4.83 

0=500 

-7.09 

-2.85 

9.18 

5.41 

0  =  1000 

-10.72 

-4.02 

12.02 

6.26 

0=  2000 

-11.58 

-4.98 

12.68 

6.88 

0=3000 

-11.36 

-5.11 

12.42 

6.89 

1=0.90  A 

0=100 

-2.70 

-0.82 

7.69 

6.13 

0=250 

-6.17 

-1.80 

9.40 

7.19 

0  =  500 

-12.46 

-3.95 

14.80 

9.44 

0=1000 

-17.25 

-5.11 

18.96 

11.02 

0  =  2000 

-16.66 

-5.53 

17.83 

10.82 

0  =  3000 

-18.05 

-7.39 

19.33 

11.68 

(the  reference  system),  trial  steps  in  the  full  system  are  made 
to  encompass  multiple  particles.  The  acceptance  probability 
of  this  composite  step  can  be  maximized  in  at  least  two  dif¬ 
ferent  ways.  The  first,  described  above,  involves  manipula¬ 
tion  of  the  thermodynamic  conditions  characterizing  the  ref¬ 
erence  system  such  that  the  variance  of  the  SU  versus  V 
distribution'11  is  minimal.  A  second  means  of  optimization, 
not  explored  here,  is  direct,  iterative  modification  of  the  ref¬ 
erence  potential  to  conform  with  “targets”  (such  as  average 
energies  or  volumes)  computed  with  the  full  potential. 
Changes  could  be  made  adaptively  to  the  reference  potential 
functional  form  or  its  parametrization  or  (in  the  case  of  an  AI 
reference  potential)  the  basis  set  or  level  of  convergence.  An 
iterative  strategy  similar  to  that  used  in  optimal  pulse 
shaping'  or  empirical  stmcture  refinement'1  could  be  em¬ 
ployed  in  combination  with  the  thermodynamic  optimization 
described  above  to  yield  an  even  more  efficient  route  to  ac¬ 
curate  equilibrium  sampling. 

Clearly  there  are  limits  to  the  range  of  reference  poten¬ 
tials  suitable  to  any  given  full  potential.  Qualitative  differ¬ 
ences  in  the  nature  of  intermolecular  forces,  such  as  the  com¬ 
plete  absence  of  attractions,  cannot  be  salvaged  using  the 
optimized  N(MC)2  procedure;  hard  spheres  will  never  be  a 
good  reference  for  AI  water.  Once  a  pair  of  potentials  is 
different  enough  that  the  overlap  of  their  SU  versus  V  distri¬ 
butions  is  nearly  zero,  collecting  statistics  to  evaluate  Eq. 
(27)  does  become  quite  difficult.  These  difficulties  were  ap¬ 
parent  even  in  the  /= 0.90  A  case  above,  in  spite  of  the  fact 
that  our  reference  and  full  potentials  have  the  same  func¬ 
tional  form.  Bennett’s  methods'9  for  estimating  AW  surely 
are  useful  in  this  context,  but  it  is  unlikely  that  even  optimal 
performance  will  be  acceptable  if  such  techniques  are  re¬ 
quired  simply  to  estimate  A  (i.e.,  thermodynamic  optimiza¬ 
tion  can  only  move  the  distribution  so  far).  In  cases  of  irre¬ 
mediably  small  overlap,  one  should  probably  opt  for  a 
different  set  of  potentials. 
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